16. A sound source identifying and separating apparatus, characterized in that it comprises:
a sound collecting means including a pair of sound collecting microphones juxtaposed with
each other across a preselected spacing and opposed to a plurality of sound sources, said two
microphones each individually capturing mixed sounds from said sound sources therewith;

an imaging means and/or a sensing means, said imaging means being adapted to
consecutively image objects that can be said sound sources, said sensing means sensing directions
in which said objects possibly being said sound sources are located;

a sound processing means for determining the directions of all said sound sources based on
sound information for a difference between phases and a difference between intensities which each
of said mixed sounds from said sound sources has when captured by said two sound collecting
microphones, respectively;

an image processing means for determining the direction of each of said objects possibly
being said sound sources, from information for image pictures imaged by said imaging means and/or
directional information for each of said objects sensed by said sensing means;

directional filters; and 

a control means for controlling operations of said sound collecting means, said imaging 
means and/or said sensing means, said image processing means, and said sound processing means, 

wherein the operations of said sound collecting means, said imaging means and/or said 
sensing means, said image processing means, and said sound processing means, are so controlled 
by said control means that: 



Hiroshi OKUNO et al Docket No. 01 1583 

said sound processing means predetermines rough directions of said sound sources from 
information for said sounds captured by said sound collecting means, and said image processing 
means determines the direction of each of said objects possibly being said sound sources within a 
range defined by said predetermined rough directions; or 

said image processing means predetermines directions of said sound sources only from 
information for image pictures imaged by said imaging means and/or directional information for 
each of said objects sensed by said sensing means, and said sound processing means determines the 
directions of said sound sources within a range of angles defined by said predetermined directions; 
or 

said sound processing means predetermines rough directions of said sound sources only from 
information for said sounds captured by said sound collecting means, and said sound processing 
means selects said directional filters corresponding to said predetermined directions of said sound 
sources, 

whereby it is made possible to identify the directions of all said sound sources and to separate 
them from one another even if neighboring ones of them lie close by. 

17. A sound source identifying and separating apparatus as set forth in claim 16,
characterized in that said sensing means is adapted to sense said objects possibly being said sound
sources in response to magnetism thereof.

18. A sound source identifying and separating apparatus as set forth in claim 16,
characterized in that said sensing means is adapted to sense said objects possibly being said sound 
sources in response to infrared rays that they emit. 

19. A sound source identifying and separating apparatus as set forth in claim 16 or claim 17, 
characterized in that said objects possibly being said sound sources have each a magnetic carrying 
material attached thereto. 

20. A sound source identifying and separating apparatus as set forth in any one of claim 
16 to claim 18, characterized in that said image processing means has a function to determine 
direction of all said objects possibly being said sound sources on the basis of a color of a said object. 

21. A sound source identifying and separating apparatus as set forth in any one of claim 
16 to claim 18, characterized in that said image processing means has a function to determine 
directions of all said objects possibly being said sound sources on the basis of a shape of a said 
object. 

22. A sound source identifying and separating apparatus as set forth in any one of claim 
16 to claim 18, characterized in that said image processing means has a function to determine 
direction of all said objects possibly being said sound sources on the basis of a color, a shape and a 
height together of a said object. 


23. A sound source identifying and separating method, characterized in that it comprises:
a first step of capturing mixed sounds from a plurality of sound sources with a pair of sound
collecting microphones juxtaposed with each other across a preselected spacing and opposed to the
sound sources, said two sound collecting microphones each individually capturing said mixed sounds
from said sound sources;

conducted concurrently with the first step, a second step in which an imaging means
consecutively images objects that can be said sound sources to produce image pictures thereof and/or
a sensing means senses directions in which said objects are located;

a third step in which a sound processing means determines a rough direction of each of all
said sound sources from sound information for said mixed sounds captured in the first step and on
the basis of information in said sound information for a difference between phases and a difference
between intensities;

a fourth step in which an image processing means determines a direction of each of said
objects possibly being all said sound sources from information for the image pictures produced
and/or information for the direction sensed in the second step, within a range defined by such rough
directions determined in the third step;

a fifth step in which said sound processing means determines a direction of each of all said
sound sources on the basis of said sound information for a difference between phases and a
difference between intensities, within a range of angles defined by such directions determined in the
fourth step;

a sixth step in which said sound processing means selects a particular directional filter in
accordance with the direction determined in the fifth step of each of all said sound sources to
separate all said sound sources from one another;

a seventh step in which said image processing means determines a direction of each of all
said objects possibly being said sound sources on the basis of information for the image pictures
produced and/or information for the direction sensed by said sensing means in the second step, and
said sound processing means determines a direction of each of all said sound sources on the basis
of said sound information for a difference between phases and a difference between intensities as
aforesaid within a range of angles defined by thus determined directions, and selects a particular
directional filter in accordance with the thus determined direction of each of all said sound sources to
separate all said sound sources from one another; and

an eighth step in which said sound processing means selects such particular filters in
accordance with such rough directions determined in the third step to separate all said sound sources
from one another.

24. A sound source identifying and separating method as set forth in claim 22, 
characterized in that the direction sensing by said sensing means is effected in response to an infrared 
ray. 

25. A sound source identifying and separating method as set forth in claim 22,
characterized in that the direction sensing by said sensing means is effected in response to 
magnetism. 

26. A sound source identifying and separating method as set forth in claim 23, 
characterized in that the direction of each of all said objects possibly being said sound sources is 
determined by said image processing means on the basis of a color thereof. 

27. A sound source identifying and separating method as set forth in claim 23, 
characterized in that the direction of each of all said objects possibly being said sound sources is 
determined by said image processing means on the basis of a shape thereof. 

28. A sound source identifying and separating method as set forth in claim 23,
characterized in that the direction of each of all said objects possibly being said sound sources is 
determined by said image processing means on the basis of a color, a shape and a height thereof. 

29. A sound source identifying method as set forth in claim 23, characterized in that 
determination of the direction of each of all said sound sources by said sound processing means on 
the basis of sound information for a difference between phases and a difference between intensities 
is effected by determining a position of each of said sources on the basis of a signal for each of
arbitrarily divided frequency bands.

30. A sound source identifying method as set forth in claim 23, characterized in that said
position information of a said object possibly being a said sound source is derived from a movement
of said object.

REMARKS



The above amendment is submitted to place the claims in substantially the same condition
as the claims which have been amended under Article 34 in the international application and to
remove improper multiple dependency of the claims. An English translation of the annexes of the
PCT international preliminary examination report is enclosed. Original claims and amended claims
1-15 have been cancelled and new claims 16-30 have been added. Early and favorable action is
awaited.

In the event there are any additional fees required, please charge our Deposit Account No.
01-2340.



Respectfully submitted, 
McLELAND & NAUGHTON, LLP
Suite 1000
1725 K Street, N.W.

SPECIFICATION

Sound Source Identifying and Separating Apparatus and Method

Technical Field

The present invention relates to a sound source identifying 
apparatus and method for identifying various sounds individually based on 
image information and sound information derived from a plurality of such 
sound sources. 

Background Art

Research has so far been undertaken on separating, from mixed
sounds, a particular sound such as a voice or a musical sound included in the
mixed sounds. For example, a sound recognition system is known
that assumes its input sound to be speech. As far as image
processing is concerned, a system is known which, in
extracting an object, assumes that its color, shape and/or movement
characterize it.

There has so far been no sound recognition system, however, that
associates sound recognition with image processing. Moreover, the
system that assumes speech input is effective only when a microphone
is near the mouth or when there is no other sound source.

Further, while a system has been proposed that separates, based on a
harmonic structure, a particular sound signal from those of a plurality of
sound sources and then finds the direction in which its sound source is
located, the accuracy with which the direction of the sound source can be
found thereby is as rough as ±10°, and it is not possible to separate the
sound source if it lies close to an adjacent sound source or sources.

There has also been proposed a method that uses as many sound
collecting microphones as there are sound sources and,
based on sound information from the various sound collecting microphones,
identifies a particular sound source. While this method is designed to
identify the intensity of a sound and the position of its source, its
frequency information comes to spread about the axis defining the
direction in which the sound source is located, thereby making it difficult
to finely identify the sound source. Further, while this method makes it
possible to increase the rate of recognition of a sound source, the
requirement for as many sound collecting microphones as there are sound
sources existing independently of one another makes the method costly.

Aimed at obviating the difficulties entailed in the prior art as
described above, the present invention has for its first object to provide a
sound source identifying apparatus that is capable of identifying an object
as a source of a sound in mixed sounds in terms of its location with greater
accuracy by using both information as to the sound and information as to
the sound source as an image thereof, and of using information as to that
position to separate the sound from the mixed sounds with due accuracy.

The present invention further has for its second object to provide a
sound source identifying method that is capable of identifying an object as
a source of a sound in mixed sounds in terms of its position with greater
accuracy by using both information as to the sound and information as to
the sound source as an image thereof, and of using information as to that
position to isolate the sound from the mixed sounds with due accuracy.

Disclosure of the Invention 

In order to achieve the first object mentioned above, there is 
provided in accordance with the present invention a sound source
identifying and separating apparatus, which apparatus is characterized in
that it comprises: a sound collecting means including a pair of sound 
collecting microphones juxtaposed with each other across a preselected 
spacing and opposed to a plurality of sound sources, the said two 
microphones each individually capturing mixed sounds from the said sound
sources therewith; an imaging means and/or a sensing means, the said
imaging means being adapted to consecutively image objects that can be 
said sound sources, the said sensing means sensing directions in which said 
objects possibly being said sound sources are located; a sound processing
means for determining the directions of all the said sound sources based on 
sound information for a difference between phases and a difference 
between intensities which each of said mixed sounds from the said sound 
sources has when captured by the said two sound collecting microphones,
respectively; an image processing means for determining the direction of 
each of the said objects possibly being the said sound sources, from 
information for image pictures imaged by the said imaging means and/or 
directional information for each of the said objects sensed by the said
sensing means; directional filters; and a control means for controlling
operations of the said sound collecting means, the said imaging means 
and/or the said sensing means, said image processing means, and said 
sound processing means, wherein the operations of the said sound 
collecting means, the said imaging means and/or the said sensing means,
the said image processing means, and the said sound processing means, are
so controlled by the said control means that: the said sound processing 
means predetermines rough directions of the said sound sources from 
information for the said sounds captured by the said sound collecting 
means, and the said image processing means determines the direction of
each of the said objects possibly being the said sound sources within a
range defined by the said predetermined rough directions; or the said image 
processing means predetermines directions of the said sound sources only 
from information for image pictures imaged by the said imaging means 
and/or directional information for each of the said objects sensed by the
said sensing means, and the said sound processing means determines the
directions of said sound sources within a range of angles defined by the 
said predetermined directions; or the said sound processing means 
predetermines rough directions of the said sound sources only from 
information for the said sounds captured by the said sound collecting
means, and the said sound processing means selects the said directional
filters corresponding to the said predetermined directions of the said sound 
sources, whereby it is made possible to identify the directions of all the 
said sound sources and to separate them from one another even if
neighboring ones of them lie close by. Also, the apparatus is characterized 
in that the direction of each of all the said objects possibly being the said 
sound sources is determined by the said image processing means on the 
basis of any one or more or all of a color, a shape and a height thereof.

Also, the apparatus is characterized in that the said sensing means is
adapted to sense the said objects possibly being the said sound sources in
response to magnetism thereof.

Also, the apparatus is characterized in that the said sensing means is
adapted to sense the said objects possibly being the said sound sources in
response to infrared rays that they emit.

Also, the apparatus is characterized in that the said objects possibly
being said sound sources each have a magnetic carrying material attached
thereto.

Also, the apparatus is characterized in that the said image
processing means has a function to determine direction of all said objects
possibly being said sound sources on the basis of a color of a said object. 

Also, the apparatus is characterized in that the said image 
processing means has a function to determine direction of all said objects
possibly being said sound sources on the basis of a shape of a said object.

Also, the apparatus is characterized in that the said image 
processing means has a function to determine directions of all said objects 
possibly being said sound sources on the basis of a color, a shape and a 
height together of a said object. 

Having a construction as mentioned above, the sound source
identifying and separating apparatus of the present invention in localizing
the locations of sound sources according to the sound information acquired 
from the sound collecting microphones is designed to narrow the directions 
of the sound sources with reference to the position information based on
the information as to image pictures imaged by the imaging means and the
information as to the directions acquired by the sensing means. 
Accordingly, the sound source identifying and separating apparatus of the 
present invention is made able to specify the objects that can be the sound
sources by using animated image pictures and directional information of 
the objects and at the same time to individually separate the sound sources 
reliably by using their position information and sound information. 

In order to achieve the second object mentioned above, there is also
provided in accordance with the present invention a sound source
identifying and separating method, which is characterized in that it 
comprises: a first step of capturing mixed sounds from a plurality of sound 
sources with a pair of sound collecting microphones juxtaposed with each
other across a preselected spacing and opposed to the sound sources, the
said two sound collecting microphones each individually capturing the said 
mixed sounds from the said sound sources; conducted concurrently with the 
first step, a second step in which an imaging means consecutively images 
objects that can be the said sound sources to produce image pictures
thereof and/or a sensing means senses directions in which the said objects
are located; a third step in which a sound processing means determines a 
rough direction of each of all the said sound sources from sound 
information for the said mixed sounds captured in the first step and on the 
basis of information in the said sound information for a difference between
(two) phases and a difference between (two) intensities (which two phases
and intensities each of the said mixed sounds from the said sound sources 
has when captured in the first step by the said two sound collecting 
microphones, respectively); a fourth step in which an image processing 
means determines a direction of each of the said objects possibly being all
the said sound sources from information for the image pictures produced
and/or information for the direction sensed in the second step, within a 
range defined by such rough directions determined in the third step; a fifth 
step in which the said sound processing means determines a direction of 
each of all the said sound sources on the basis of the said sound
information for a difference between (two) phases and a difference between
(two) intensities (which two phases and intensities each of the said mixed 
sounds from the said sound sources has when captured in the first step by 
the said two sound collecting microphones, respectively), within a range of
angles defined by such directions determined in the fourth step; a sixth step 
in which the said sound processing means selects a particular directional
filter in accordance with the direction determined in the fifth step of each
of all the said sound sources to separate all the said sound sources from one
another; a seventh step in which the said image processing means 
determines a direction of each of all the said objects possibly being the 
said sound sources on the basis of information for the image pictures 
produced and/or information for the direction sensed by the said sensing
means in the second step, and the said sound processing means determines
a direction of each of all the said sound sources on the basis of the said 
sound information for a difference between phases and a difference 
between intensities as aforesaid within a range of angles defined by thus 
determined directions, and selects a particular directional filter in
accordance with the thus determined direction of each of all said sound sources to
isolate all the said sound sources from one another; and an eighth step in
which the said sound processing means selects such particular filters in 
accordance with such rough directions determined in the third step to 
separate all the said sound sources from one another. 

Also, the method is characterized in that the direction of each of all
the said objects possibly being the said sound sources is determined by the
said image processing means on the basis of any one or more or all of a 
color, a shape and a height thereof. 

Also, the method is characterized in that the direction sensing by
the said sensing means is effected in response to an infrared ray.

Also, the method is characterized in that the direction sensing by 
the said sensing means is effected in response to magnetism. 

Also, the method is characterized in that the direction of each of all 
the said objects possibly being the said sound sources is determined by the
said image processing means on the basis of a color thereof.

Also, the method is characterized in that the direction of each of all 
the said objects possibly being the said sound sources is determined by the 
said image processing means on the basis of a shape thereof.

Also, the method is characterized in that the direction of each of all 
the said objects possibly being the said sound sources is determined by the 
said image processing means on the basis of a color, a shape and a height
thereof.

Also, the method is characterized in that determination of the 
direction of each of all the said sound sources by the said sound processing 
means on the basis of sound information for a difference between phases 
and a difference between intensities is effected by determining a position
of each of said sources on the basis of a signal for each of arbitrarily
divided frequency bands.

Also, the method is characterized in that the said position 
information of a said object possibly being a said sound source is derived 
from a movement of the said object. 

Organized as mentioned above, the sound source identifying
method according to the present invention permits not only sound
information of a plurality of sound sources to be derived from a sound 
collecting means made of the two sound collecting microphones opposed to 
the sound sources, but also image information of these sound sources to be
derived from image pictures thereof imaged by an imaging means. Further,
sensing the directions of the sound sources by magnetism or an infrared ray 
gives rise to direction sensing information. And, when the sound 
processing means is localizing the locations of the sound sources based on 
sound information, e.g., on the basis of a difference between phases and a
difference between intensities in sound information acquired by the sound
collecting microphones for each of the sound sources, the direction of each 
of the sound sources is narrowed with reference to position information 
derived for each of objects possibly being the sound sources by an image 
processing means, e.g., from its color, shape and/or movement based on
either or both of the direction sensing information and the image
information derived from the imaging means, thereby permitting the sound 
sources to be localized as to their locations on the basis of signals in 
various frequency bands, e.g., harmonic structures. Consequently, the
method makes it unnecessary to process the sound information 
omnidirectionally or over all the directions in identifying the sound 
sources, makes it possible to identify the sound source with greater 
certainty, makes a lesser amount of processable information sufficient and
makes it possible to reduce the time for processing. 

In this case, the ability to identify three or more sound sources with 
two sound collecting microphones in the sound collecting means makes it 
possible to effect accurate identification of the positions of sound sources
with a simple construction.

Also, if the method set forth above is conducted such that the third
step derives information as to rough locations of the sound
sources only from the sound information of the sounds collected in the first
step, and the fourth step narrows in advance the directions of
the sound sources based on the rough position information derived in the
third step, thereby deriving the position information of the objects possibly
being the sound sources, then there results a reduction in the amount of
information to be processed in deriving the position information of the
objects possibly being the sound sources based on the image information in
the third step, which simplifies the processing.

from the microphones in the sound collecting means 11 and the position
information A3, B3 and C3 derived by the image processing means 13. 

In the identification of the positions of the sound sources, the sound 
information may be based on a difference in phase and a difference in 
intensity between two pieces of sound information received by the right
hand side and left hand side sound collecting microphones 11a and 11b,
respectively. 

Thus in deriving sound information from a given sound source, as
shown in Fig. 4, use may be made here of the fact that, changing as a
function of the direction θ in which a sound from the sound source
propagates in arriving at the two sound collecting microphones 11a and 11b
(θ = 0 when the sound source is in front, a minus value when it is to the left
and a plus value when it is to the right of them), a difference d between distances
from the sound source to the two microphones 11a and 11b (expressed by
the equation d = D sin θ) causes the sound to vary in phase and also, by
damping, to vary in intensity as it arrives at them.
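
The relation between direction and phase can be sketched as follows. This is a minimal illustration only, assuming that D in the equation is the spacing between the two microphones and that sound travels at about 343 m/s in air; the function names and the constant are hypothetical and do not come from the specification.

```python
import math

# Assumed speed of sound in air (m/s); not stated in the specification.
SPEED_OF_SOUND = 343.0


def path_difference(spacing_m, theta_deg):
    """Path difference d = D sin(theta) between the two microphones."""
    return spacing_m * math.sin(math.radians(theta_deg))


def phase_difference(spacing_m, theta_deg, freq_hz):
    """Phase difference (radians) at one frequency caused by the path difference."""
    delay_s = path_difference(spacing_m, theta_deg) / SPEED_OF_SOUND
    return 2.0 * math.pi * freq_hz * delay_s


if __name__ == "__main__":
    # A source in front (theta = 0) gives no path difference;
    # a source to the left (negative theta) gives a negative one.
    for theta in (-30, 0, 20):
        print(theta, path_difference(0.2, theta), phase_difference(0.2, theta, 500.0))
```

Because the phase difference grows with frequency, the same path difference yields a different phase in each frequency band, which is consistent with the band-by-band processing described elsewhere in this specification.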

Further, because the location of the sound source is not clear as yet,
the sound processing means 14 here effects processing as mentioned above
over the entire range of angles: -90 degrees ≤ θ ≤ +90 degrees. In this
case, the processing operation may be lightened by, for example,
processing at every angular interval, e.g., every 5 degrees of θ.
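
The coarse scan over the full angular range can be sketched as below. It is an illustration under stated assumptions, not the processing actually performed by the sound processing means 14: the observed inter-microphone delay is taken as already measured, the microphone spacing and the speed of sound are assumed values, and the function names are hypothetical.

```python
import math

# Assumed speed of sound in air (m/s); not stated in the specification.
SPEED_OF_SOUND = 343.0


def candidate_angles(lo_deg=-90, hi_deg=90, step_deg=5):
    """Candidate directions from lo_deg to hi_deg inclusive, every step_deg degrees."""
    return list(range(lo_deg, hi_deg + 1, step_deg))


def estimate_direction(observed_delay_s, spacing_m, angles_deg):
    """Return the candidate angle whose predicted delay is closest to the observed one."""
    def predicted_delay(theta_deg):
        return spacing_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND

    return min(angles_deg, key=lambda t: abs(predicted_delay(t) - observed_delay_s))


if __name__ == "__main__":
    spacing = 0.2  # assumed microphone spacing in metres
    true_delay = spacing * math.sin(math.radians(20)) / SPEED_OF_SOUND
    print(len(candidate_angles()))  # number of directions scanned
    print(estimate_direction(true_delay, spacing, candidate_angles()))
```

With a 5 degree step only 37 directions are evaluated instead of a continuous sweep, at the cost of a coarser estimate.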

The sound processing means 14 first selects or determines rough
directions A0, B0 and C0 of the sound sources based on left and right sound
information from the sound collecting means 11. This follows the
conventional sound source identifying technique (see JP 9-33330 A),
yielding an accuracy of ±10 degrees.

And, the sound processing means 14 outputs these rough directions
A0, B0 and C0 for entry into the image processing means 13.

Further, the sound processing means 14 with reference to the
position information A3, B3 and C3 entered therein from the image
processing means 13 localizes the locations of the sound sources based
again on the sound information narrowed into the ranges of the position 

sources A, B and C according to the sound information received from the
sound collecting means within a given range of angles for the position 
information A3, B3 and C3 received from the image processing means 13.

Finally, in step ST6 the sound processing means 14 selects a
particular directional filter to selectively extract sound information of a
same sound from a same sound source and with a particular time delay. In
this manner, it will be seen that in the sound source identifying apparatus 10
according to the illustrated form of embodiment of the present invention,
the sound processing means 14, in identifying a sound source, is
made to operate based not only on sound information received from the
sound collecting means 11 but also on an image picture imaged by the
imaging means 12, thus referring to position information A3, B3, C3
of an object that can be the sound source. The apparatus therefore has the
ability to identify a sound source with an accuracy higher than the around
±10 degrees attainable with the conventional system, which relies only on
sound information from the sound collecting means 11.
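
One simple way to picture a directional filter keyed to a time delay is a two-channel delay-and-sum, sketched below. The specification does not say that its directional filters are built this way, so this is an assumed stand-in: the sign convention for the lag, the sample rate, the microphone spacing and all function names are hypothetical.

```python
import math

# Assumed speed of sound in air (m/s); not stated in the specification.
SPEED_OF_SOUND = 343.0


def delay_in_samples(theta_deg, spacing_m, sample_rate_hz):
    """Inter-microphone delay for direction theta, rounded to whole samples."""
    delay_s = spacing_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return round(delay_s * sample_rate_hz)


def delay_and_sum(left, right, lag):
    """Average left[n] with right[n + lag].

    lag is the number of samples by which the right channel lags the left
    for the direction of interest (an assumed convention). Sound from that
    direction adds coherently; sound from other directions is attenuated.
    """
    out = []
    for n in range(len(left)):
        m = n + lag
        r = right[m] if 0 <= m < len(right) else 0.0
        out.append(0.5 * (left[n] + r))
    return out


if __name__ == "__main__":
    left = [0.0, 1.0, 0.0, 0.0, 0.0]
    right = [0.0, 0.0, 0.0, 1.0, 0.0]  # same pulse, two samples later
    print(delay_and_sum(left, right, 2))  # aligned: pulse kept at full height
    print(delay_and_sum(left, right, 0))  # misaligned: pulse halved and smeared
```

Selecting a filter then amounts to choosing the lag that matches the direction determined for each sound source.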

It is further seen that the accuracy of localizing a sound source is
enhanced by refining, with position information derived from image
information, the sound information that beforehand roughly separates
the sound source from another sound source, which makes its
identification reliable even if they are close to each other.
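
The narrowing of the acoustic search by an image-derived direction can be sketched as follows. This is only an illustration under assumptions: the half-width of the search window, the 1 degree step, the microphone spacing and the function names are all hypothetical, and the observed delay is taken as already measured.

```python
import math

# Assumed speed of sound in air (m/s); not stated in the specification.
SPEED_OF_SOUND = 343.0


def narrowed_angles(center_deg, half_width_deg, step_deg=1.0):
    """Candidate directions restricted to a window around an image-derived direction."""
    count = int(round(2 * half_width_deg / step_deg))
    return [center_deg - half_width_deg + i * step_deg for i in range(count + 1)]


def refine_direction(observed_delay_s, spacing_m, center_deg, half_width_deg):
    """Pick the best-matching direction inside the narrowed window only."""
    def predicted_delay(theta_deg):
        return spacing_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND

    window = narrowed_angles(center_deg, half_width_deg)
    return min(window, key=lambda t: abs(predicted_delay(t) - observed_delay_s))


if __name__ == "__main__":
    spacing = 0.2  # assumed microphone spacing in metres
    true_delay = spacing * math.sin(math.radians(22.0)) / SPEED_OF_SOUND
    # The image processing suggests roughly +20 degrees; search only 15..25 degrees.
    print(len(narrowed_angles(20.0, 5.0)))
    print(refine_direction(true_delay, spacing, 20.0, 5.0))
```

Searching a small window at a fine step needs fewer evaluations than a fine scan of the whole range, and it keeps a neighboring source outside the window from being picked by mistake.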

More specifically, if three talkers as sound sources are imaged by
the imaging means 12 consecutively, image pictures are obtainable,
for example, as shown in Fig. 9, which are the 7th, 51st, 78th and
158th frames of all the pictures consecutively imaged.

Here, these talkers' faces are actually located as shown in Fig. 10(A),
from which it is apparent that the talkers are positioned at around -30
degrees, 0 degrees and +20 degrees of directional angle, respectively.

Then, if determination is made to locate these objects possibly as 

What is claimed is:

1. (Amended) A sound source identifying and separating apparatus, 
characterized in that it comprises: 
a sound collecting means including a pair of sound collecting
microphones juxtaposed with each other across a preselected spacing and 
opposed to a plurality of sound sources, said two microphones each 
individually capturing mixed sounds from said sound sources therewith; 

an imaging means and/or a sensing means, said imaging means 
being adapted to consecutively image objects that can be said sound
sources, said sensing means sensing directions in which said objects 
possibly being said sound sources are located; 

a sound processing means for determining the directions of all said 
sound sources based on sound information for a difference between phases 
and a difference between intensities which each of said mixed sounds from
said sound sources has when captured by said two sound collecting 
microphones, respectively; 

an image processing means for determining the direction of each of 
said objects possibly being said sound sources, from information for image 
pictures imaged by said imaging means and/or directional information for
each of said objects sensed by said sensing means; 

directional filters; and 

a control means for controlling operations of said sound collecting 
means, said imaging means and/or said sensing means, said image 
processing means, and said sound processing means,

wherein the operations of said sound collecting means, said 
imaging means and/or said sensing means, said image processing means, 
and said sound processing means, are so controlled by said control means 
that: 

said sound processing means predetermines rough directions of said
sound sources from information for said sounds captured by said sound 
collecting means, and said image processing means determines the 
direction of each of said objects possibly being said sound sources within a
range defined by said predetermined rough directions; or 

said image processing means predetermines directions of said sound 
sources only from information for image pictures imaged by said imaging 
means and/or directional information for each of said objects sensed by
said sensing means, and said sound processing means determines the 
directions of said sound sources within a range of angles defined by said 
predetermined directions; or 

said sound processing means predetermines rough directions of said 
sound sources only from information for said sounds captured by said
sound collecting means, and said sound processing means selects said 
directional filters corresponding to said predetermined directions of said 
sound sources, 

whereby it is made possible to identify the directions of all said 
sound sources and to separate them from one another even if neighboring
ones of them lie close by. 

2. (Amended) A sound source identifying and separating apparatus 
as set forth in claim 1, characterized in that said sensing means is adapted
to sense said objects possibly being said sound sources in response to
magnetism thereof. 

3. (Amended) A sound source identifying and separating apparatus 
as set forth in claim 1, characterized in that said sensing means is adapted
to sense said objects possibly being said sound sources in response to
infrared rays that they emit. 

4. (Amended) A sound source identifying and separating apparatus 
as set forth in claim 1 or claim 2, characterized in that said objects possibly
being said sound sources have each a magnetic carrying material attached
thereto. 
5. (Amended) A sound source identifying and separating apparatus
as set forth in any one of claim 1 to claim 4, characterized in that said 
image processing means has a function to determine direction of all said 
objects possibly being said sound sources on the basis of a color of a said
object.

6. (Amended) A sound source identifying and separating apparatus 
as set forth in any one of claim 1 to claim 4, characterized in that said 
image processing means has a function to determine directions of all said
objects possibly being said sound sources on the basis of a shape of a said
object. 

7. (Amended) A sound source identifying and separating apparatus 
as set forth in any one of claim 1 to claim 4, characterized in that said
image processing means has a function to determine direction of all said
objects possibly being said sound sources on the basis of a color, a shape 
and a height together of a said object. 

8. (Amended) A sound source identifying and separating method, 
characterized in that it comprises:

a first step of capturing mixed sounds from a plurality of sound 
sources with a pair of sound collecting microphones juxtaposed with each 
other across a preselected spacing and opposed to the sound sources, said 
two sound collecting microphones each individually capturing said mixed
sounds from said sound sources;

conducted concurrently with the first step, a second step in which 
an imaging means consecutively images objects that can be said sound 
sources to produce image pictures thereof and/or a sensing means senses 
directions in which said objects are located; 

a third step in which a sound processing means determines a rough
direction of each of all said sound sources from sound information for said
mixed sounds captured in the first step and on the basis of information in
said sound information for a difference between phases and a difference
between intensities;

a fourth step in which an image processing means determines a
direction of each of said objects possibly being all said sound sources from
information for the image pictures produced and/or information for the
direction sensed in the second step, within a range defined by such rough
directions determined in the third step;

a fifth step in which said sound processing means determines a
direction of each of all said sound sources on the basis of said sound
information for a difference between phases and a difference between
intensities, within a range of angles defined by such directions determined
in the fourth step;

a sixth step in which said sound processing means selects
particular directional filters in accordance with the direction determined in
the fifth step of each of all said sound sources to separate all said sound
sources from one another;

a seventh step in which said image processing means determines a
direction of each of all said objects possibly being said sound sources on
the basis of information for the image pictures produced and/or
information for the direction sensed by said sensing means in the second
step, and said sound processing means determines a direction of each of all
said sound sources on the basis of said sound information for a difference
between phases and a difference between intensities as aforesaid within a
range of angles defined by thus determined directions, and selects a
particular directional filter in accordance with the thus determined
direction of each of all said sound sources to separate all said sound
sources from one another; and

an eighth step in which said sound processing means selects such
particular filters in accordance with such rough directions determined in
the third step to separate all said sound sources from one another.

9. (Amended) A sound source identifying and separating method as 
set forth in any one of claim 7 to claim 13, characterized in that the
direction sensing by said sensing means is effected in response to an 
infrared ray. 



10. (Amended) A sound source identifying and separating method
as set forth in any one of claim 7 to claim 13, characterized in that the 
direction sensing by said sensing means is effected in response to 
magnetism. 

11. (Amended) A sound source identifying and separating method
as set forth in claim 8, characterized in that the direction of each of all said 
objects possibly being said sound sources is determined by said image 
processing means on the basis of a color thereof. 

12. (Amended) A sound source identifying and separating method
as set forth in claim 8, characterized in that the direction of each of all said 
objects possibly being said sound sources is determined by said image 
processing means on the basis of a shape thereof. 

13. (Amended) A sound source identifying and separating method
as set forth in claim 8, characterized in that the direction of each of all said 
objects possibly being said sound sources is determined by said image 
processing means on the basis of a color, a shape and a height thereof. 

14. (Amended) A sound source identifying method as set forth in
any one of claim 8 to claim 13, characterized in that determination of the 
direction of each of all said sound sources by said sound processing means 
on the basis of sound information for a difference between phases and a 
difference between intensities is effected by determining a position of each
of said sources on the basis of a signal for each of arbitrarily divided
frequency bands.
15. (Amended) A sound source identifying method as set forth in
claim 8, characterized in that said position information of a said object 
possibly being a said sound source is derived from a movement of said 
object. 
SPECIFICATION

Sound Source Identifying Apparatus and Method

Technical Field

The present invention relates to a sound source identifying 
apparatus and method for identifying various sounds individually based on 
image information and sound information derived from a plurality of such 
sound sources. 

Background Art 

Research has so far been undertaken to separate from mixed
sounds a particular sound, such as a voice or a musical sound, included in
the mixed sounds. For example, a sound recognition system has been known
that assumes its input sound to be speech or voices. And, insofar as image
processing is concerned, a system has been known which, in extracting an
object, assumes its color, shape and/or movement to characterize it.

There has so far been no sound recognition system, however, that
associates sound recognition with image processing. Moreover, the
system assuming speech or voices is effective only when a microphone
is near the mouth or where there is no other sound source.

Further, while a system has been proposed that separates, based on a
harmonic structure, a particular sound signal from those from a plurality of
sound sources and then finds the direction in which its sound source is
located, the accuracy with which the direction of the sound source can be
found thereby is as rough as ±10 degrees, and it is not possible to separate
the sound source if it lies close to an adjacent sound source or sources.

There has also been proposed a method that uses a plurality of
sound collecting microphones the same in number as the sound sources and
identifies a particular sound source based on sound information from the
various sound collecting microphones. While this method is designed to
identify the intensity of a sound and the location of its source, its
frequency information comes to spread about the axis defining the
direction in which the sound source is located, thereby making it difficult
to finely identify the sound source. Further, while this method makes it
possible to increase the rate of recognition of a sound source, the
requirement for sound collecting microphones the same in number as sound
sources existing independently of one another makes the method costly.

Aimed to obviate the difficulties entailed in the prior art as 
described above, the present invention has for its first object to provide a
sound source identifying apparatus that is capable of identifying an object
as a source of a sound in mixed sounds in terms of its location with greater 
accuracy by using both information as to the sound and information as to 
the sound source as an image thereof and using information as to that 
position to separate the sound from the mixed sounds with due accuracy. 

The present invention further has for its second object to provide a
sound source identifying method that is capable of identifying an object as 
a source of a sound in mixed sounds in terms of its location with greater 
accuracy by using both information as to the sound and information as to 
the sound source as an image thereof and using information as to that
position to separate the sound from the mixed sounds with due accuracy.

Disclosure of the Invention 

In order to achieve the first object mentioned above, there is
provided in accordance with the present invention a sound source
identifying apparatus, which apparatus comprises: a sound collecting
means for capturing sounds from a plurality of sound sources with a pair of
sound collecting microphones juxtaposed with each other across a
preselected spacing and opposed to the sound sources and for processing
the captured sounds; either or both of an imaging means and a sensing
means, the imaging means being adapted to consecutively image objects
that can be the said sound sources, the said sensing means being for
sensing the said objects possibly being the said sources; an image
processing means for deriving information as to locations of the said
objects possibly being the said sound sources, from either or both of image
pictures imaged by the said imaging means and directional information of
the said objects sensed by the said sensing means; a sound processing
means for localizing the positions of the said sound sources based on sound
information of the said sounds captured by the said sound collecting means
and position information derived by the said image processing means; and
a control means for controlling operations of the said sound collecting
means, the said imaging means and/or the said sensing means, the said
image processing means, and the said sound processing means.

Further, in addition to the construction mentioned above, the said 
sound processing means preferably includes directional filters, each of 
which is adapted to extract sound information at a particular time instant 
selectively. 

The said sound processing means preferably has a function to
derive information as to rough directions in which said objects possibly 
being the sound sources are located. 

The said sensing means preferably is adapted to sense the said 
objects possibly being the said sound sources in response to magnetism
thereof or an infrared ray therefrom.

Preferably, the said objects possibly being the said sound sources 
have each a material carrying magnetism attached thereto. 

Having a construction as mentioned above, the sound source 
identifying apparatus of the present invention in localizing the locations of
sound sources according to the sound information acquired from the sound
collecting microphones is designed to narrow the directions of the sound 
sources with reference to the position information based on the information 
as to image pictures imaged by the imaging means and the information as 
to the directions acquired by the sensing means. 

Accordingly, the sound source identifying apparatus of the present
invention is able to specify each object that can be a sound source
by using dynamic image pictures and directional information of the
objects and at the same time to individually separate the sound sources
reliably by using their position information and sound information.

In order to achieve the second object mentioned above, there is also
provided in accordance with the present invention a sound source
identifying method, which comprises: a first step of capturing sounds from
a plurality of sound sources with a pair of sound collecting microphones
juxtaposed with each other across a spacing and opposed to the sound
sources and processing the captured sounds, in a sound collecting means; a
second step, conducted concurrently with the first step, of consecutively
imaging objects that can be the said sound sources and/or sensing
directions in which the said objects are located; a third step of deriving
information as to locations of the said objects possibly being the said
sound sources, from either or both of image pictures imaged, and the
directions sensed, in the second step; and a fourth step of localizing
locations of the said sound sources based on sound information of the
sounds collected in the first step and position information derived in the
third step.

The sound source identifying method according to the present
invention preferably further includes a fifth step of deriving information as
to rough locations of the said sound sources only from the sound
information of the said sounds collected in the said first step, wherein the
said third step includes narrowing in advance directions of the said sound
sources based on the rough position information derived in the said fifth
step, thereby deriving the said position information of the said objects
possibly being the said sound sources.

Preferably in the sound source identifying method according to the 
present invention, the said fifth step roughly derives the directions of the 
said sound sources from a difference between phases and a difference 
between intensities of each of the said sounds acquired by the said sound
collecting microphones.

Preferably in the sound source identifying method according to the 
present invention, the said position information of the said objects possibly 
being the said sound sources is derived in the said third step on the basis of
either or both of a color and a shape of a said object. 

In the sound source identifying method according to the present 
invention, the said fourth step preferably localizes the locations of the said 
sound sources by selecting particular preset directional filters in response
to the position information derived in the said third step. 

Preferably in the sound source identifying method according to the
present invention, the locations of the said sound sources are localized in
the said fourth or fifth step on the basis of a signal in each of frequency
bands into which the sound information obtained in the said first step is
arbitrarily divided.

Preferably in the sound source identifying method according to the 
present invention, the said position information of a said object possibly 
being a said sound source is derived from a movement of the said object.

Preferably in the sound source identifying method according to the
present invention, a said direction is sensed in response to magnetism or an
infrared ray.

Organized as mentioned above, the sound source identifying 
method according to the present invention permits not only sound
information of a plurality of sound sources to be derived from a sound
collecting means made of the two sound collecting microphones opposed to 
the sound sources, but also image information of these sound sources to be 
derived from image pictures thereof imaged by an imaging means. Further, 
sensing the directions of the sound sources by magnetism or an infrared ray
gives rise to direction sensing information. And, when the sound
processing means is localizing the locations of the sound sources based on 
sound information, e.g., on the basis of a difference between phases and a 
difference between intensities in sound information acquired by the sound 
collecting microphones for each of the sound sources, the direction of each
of the sound sources is narrowed with reference to position information
derived for each of objects possibly being the sound sources by an image 
processing means, e.g., from its color, shape and/or movement based on 
either or both of the direction sensing information and the image
information derived from the imaging means, thereby permitting the sound
sources to be localized as to their locations on the basis of signals in
various frequency bands, e.g., harmonic structures. Consequently, the
method makes it unnecessary to process the sound information
omnidirectionally or over all the directions in identifying the sound
sources, makes it possible to identify the sound source with greater
certainty, makes a lesser amount of processable information sufficient and
makes it possible to reduce the time for processing.

In this case, the ability to identify three or more sound sources with
two sound collecting microphones in the sound collecting means makes it 
possible to effect accurate identification of the locations of sound sources 
in simple construction. 

Also, if the method is so conducted as set forth above that the fifth
step is included of deriving information as to rough locations of the sound
sources only from the sound information of the sounds collected in the first 
step and that the third step includes narrowing in advance directions of the 
sound sources based on the rough position information derived in the fifth 
step, thereby deriving the position information of the objects possibly 

20 being the sound sources, then there results a reduction in the amount of 
information for processing in deriving the position information of the 
objects possibly being the sound sources based on the image information in 
the third step, which simplifies the processing. 

If the method is so conducted as set forth above that the fourth step
localizes the locations of the sound sources by selecting particular preset
directional filters in response to the position information derived in the 
third step to extract sound information from each of the sound sources, 
having each of the directional filters preset for extracting sound 
information from each of the sound sources located in the corresponding 

30 direction permits the processing to localize the locations of all the sound 
sources to go on smoothly. 
Brief Description of the Drawings

The present invention will better be understood from the following 
detailed description and the drawings attached hereto showing certain 
illustrative forms of embodiment of the present invention. In this 
connection, it should be noted that such forms of embodiment illustrated in 
the accompanying drawings hereof are intended in no way to limit the 
present invention but to facilitate an explanation and understanding 
thereof. 

In the drawings: 

Fig. 1 is a diagrammatic view illustrating the makeup of a first form 
of embodiment of the sound source identifying apparatus according to the 
present invention; 

Fig. 2 is a diagrammatic view of an exemplary image picture taken 
or imaged by an imaging means in the sound source identifying apparatus 
shown in Fig. 1; 

Fig. 3 is an explanatory view for the image picture in the sound
source identifying apparatus of Fig. 1 in which (A) shows rough directions
A0, B0 and C0 of sound sources determined by a sound processing means,
(B) shows frames A1, B1 and C1 of objects possibly as the sound sources
determined by an image processing means, and (C) shows pieces of
position information A3, B3 and C3 of the objects possibly as the sound
sources determined by the image processing means;

Fig. 4 is an explanatory view illustrating a difference in distance 
between a sound source and two sound collecting microphones included in 
a sound collecting means in the sound source identifying apparatus of Fig. 1;

Fig. 5 is a graph illustrating an operation of a directional filter 
included in the sound processing means in the sound source identifying 
apparatus of Fig. 1; 

Fig. 6 is a graph illustrating the extraction of two pieces of sound 
information for a sound from a single sound source performed in the sound 
processing means in the sound source identifying apparatus of Fig. 1; 
Fig. 7 is an explanatory view illustrating the extraction of sound
information from each sound source performed by the directional filter in 
the sound processing means in the sound source identifying apparatus of 
Fig. 1; 

Fig. 8 is a flow chart illustrating a method of operating the sound
source identifying apparatus of Fig. 1;

Fig. 9 is a pictorial diagram illustrating a portion of consecutive
image pictures taken by the imaging means in the sound source identifying 
apparatus of Fig. 1; and

Fig. 10 is a graph illustrating information as to positions
determined by the image processing means on a variety of bases of an 
object that can be a sound source in the sound source identifying apparatus 
of Fig. 1. 

Best Modes for Carrying Out the Invention

Hereinafter, the present invention for a sound source identifying 
apparatus and method will be described in detail with respect to presently 
best forms of embodiments thereof illustrated in the drawing figures. 

Fig. 1 shows a form of embodiment of the sound source identifying
apparatus according to the present invention.

Referring to Fig. 1, the sound source identifying apparatus 10
includes a sound collecting means 11, an imaging means 12, an image
processing means 13, a sound processing means 14 and a control means 15.

The sound collecting means 11 is designed to capture sounds from a
plurality of sound sources, for example, three talkers, with a pair of sound
collecting microphones 11a and 11b juxtaposed with each other across a
preselected spacing D as indicated in Fig. 1 and opposed to the sound
sources, and to process the captured sounds. While these sound collecting
microphones may be disposed in any suitable manner, in the
example shown they are provided at opposite sides of the imaging means
12, namely at its right hand and left hand sides.

The imaging means 12 is constituted, for example, of a CCD
(charge coupled device) camera and is designed as shown in Fig. 2 to make
image pictures of a plurality of sound sources, e.g., three talkers A, B, and
C, consecutively.

The image processing means 13 is designed to derive information as
to the locations of objects that can be sound sources, in images taken by the
imaging means 12, based on their color, shape or movement. It should
be noted here that the term "movement" is intended to include vibrations.

In this case, the image processing means 13 sets up, in the
image picture taken by the imaging means 12, frames A1, B1 and C1 for the
three talkers A, B and C according to the color (i.e., the color of the skin of
a human being) and height as shown in Fig. 3(B). Then, as shown in Fig.
3(C), the image processing means 13 selects center points A2, B2 and C2 of
these frames A1, B1 and C1 (indicated in Fig. 3 by the + marks) as the
respective locations of the objects that can be sound sources and takes
their respective horizontal coordinates A3, B3 and C3 as the position
information.
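
By way of illustration only, the derivation of the position information
from a frame may be sketched as follows in Python. The names Frame,
center_point, horizontal_coordinate and horizontal_angle are hypothetical
and do not appear in the embodiment. The conversion of a horizontal
coordinate into a directional angle further assumes a pinhole camera with
a known horizontal field of view, which the embodiment does not specify.

```python
import math
from typing import NamedTuple, Tuple


class Frame(NamedTuple):
    """Hypothetical rectangular frame (such as A1, B1 or C1), in pixels."""
    left: float
    top: float
    right: float
    bottom: float


def center_point(frame: Frame) -> Tuple[float, float]:
    """Center of the frame (corresponding to A2, B2 or C2)."""
    return ((frame.left + frame.right) / 2.0, (frame.top + frame.bottom) / 2.0)


def horizontal_coordinate(frame: Frame) -> float:
    """Horizontal coordinate of the center (corresponding to A3, B3 or C3)."""
    return center_point(frame)[0]


def horizontal_angle(x: float, image_width: float, fov_deg: float) -> float:
    """Assumed pinhole-camera conversion of a pixel column into an angle.

    0 degrees is the image center; negative is left, positive is right.
    """
    half_width = image_width / 2.0
    focal_px = half_width / math.tan(math.radians(fov_deg) / 2.0)
    return math.degrees(math.atan((x - half_width) / focal_px))
```

Under these assumptions a frame centered in the image maps to 0 degrees,
and a frame centered at the right edge maps to half the field of view.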

At this point, it should be noted that the expressions "objects" "that can
be", "possibly being" or "possibly as" "sound sources" are used here
because it is not necessarily clear as yet from image
recognition alone whether they are indeed sound sources or not.

Preferably, in order to simplify the above image processing, the
image processing means 13 should beforehand have entered therein the rough
directions A0, B0 and C0 of these sound sources (see Fig. 3(A)) that are
determined by the sound processing means 14 to be described in detail
below. Thus, having narrowed the respective regions of image processing
of the sound sources into the rough directions A0, B0 and C0, the image
processing means 13 derives the information A3, B3 and C3 as to the
respective locations, as mentioned above, of the objects that can be the
sound sources by effecting the image processing within the narrowed
regions of the rough directions A0, B0 and C0.

The sound processing means 14 is designed to localize the locations
of the sound sources based, for example, on the sound information derived
from the microphones in the sound collecting means 11 and the position
information A3, B3 and C3 derived by the image processing means 13.

In the identification of the positions of the sound sources, the sound
information may be based on a difference in phase and a difference in
intensity between two pieces of sound information received by the right
hand side and left hand side sound collecting microphones 11a and 11b,
respectively.

Thus in deriving sound information from a given sound source, use may
be made here of the fact, shown in Fig. 4, that a difference d between the
distances from the sound source to the two sound collecting microphones
11a and 11b changes as a function of the direction θ from which a sound
from the sound source arrives at them (θ = 0 when the sound source is in
front of them, negative when it is to their left and positive when it is
to their right), as expressed by the equation d = D sin θ, and that this
difference causes the sound to vary in phase and also, by damping, to vary
in intensity as it arrives at them.
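
The relation d = D sin θ can be illustrated numerically as follows. This
is a minimal sketch, not part of the embodiment: the function names are
hypothetical, and the speed of sound (343 m/s, air at about 20 °C) and the
plane-wave (distant source) approximation are assumptions introduced here
in order to turn the path difference into an arrival-time difference and a
phase difference at a given frequency.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the specification


def path_difference(spacing_m: float, theta_deg: float) -> float:
    """d = D sin(theta); theta is 0 in front, negative left, positive right."""
    return spacing_m * math.sin(math.radians(theta_deg))


def arrival_delay(spacing_m: float, theta_deg: float) -> float:
    """Arrival-time difference in seconds implied by the path difference."""
    return path_difference(spacing_m, theta_deg) / SPEED_OF_SOUND


def phase_difference(spacing_m: float, theta_deg: float, freq_hz: float) -> float:
    """Phase difference in radians at freq_hz implied by the delay."""
    return 2.0 * math.pi * freq_hz * arrival_delay(spacing_m, theta_deg)
```

With a spacing D of 0.3 m, for example, a source at -30 degrees gives a
path difference of about -0.15 m, while a source directly in front gives
none.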

Further, because the location of the sound source is not clear as yet,
the sound processing means 14 here effects processing as mentioned above
over the entire range of angles: -90 degrees ≤ θ ≤ +90 degrees. In this
case, the processing operation may be lightened by, for example,
processing at every angular interval of θ, e.g., every 5 degrees.
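
The scan over the whole range of angles at a fixed interval may be
sketched as below. The sketch assumes, for simplicity, that a single
inter-microphone arrival delay has already been measured and picks the
grid angle whose predicted delay is closest to it; the embodiment itself
works from phase and intensity differences, so this matching criterion,
the function names and the speed of sound are all assumptions made here
for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def candidate_angles(step_deg: int = 5) -> list:
    """Grid of directions from -90 to +90 degrees at the given interval."""
    return list(range(-90, 91, step_deg))


def rough_direction(delay_s: float, spacing_m: float, step_deg: int = 5) -> int:
    """Grid angle whose predicted delay D sin(theta) / c best matches delay_s."""
    def predicted(theta_deg: int) -> float:
        return spacing_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND

    return min(candidate_angles(step_deg),
               key=lambda theta: abs(predicted(theta) - delay_s))
```

At a 5 degree interval the grid has 37 candidate directions, compared
with 181 at a 1 degree interval, which is the sense in which the
processing is lightened.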

The sound processing means 14 first selects or determines rough
directions A0, B0 and C0 of the sound sources based on the left and right
sound information from the sound collecting means 11. This follows the
conventional sound source identifying technique yielding an accuracy of
±10 degrees.

And, the sound processing means 14 outputs these rough directions
A0, B0 and C0 for entry into the image processing means 13.

Further, the sound processing means 14 with reference to the
position information A3, B3 and C3 entered therein from the image
processing means 13 localizes the locations of the sound sources based
again on the sound information, now narrowed into the ranges of the
position information A3, B3 and C3.

In this case, the sound processing means 14 localizes the locations 
of the sound sources by making an appropriate choice of what are called 
directional filters for the sound sources A, B and C, respectively. 

Here, such directional filters, prepared so as to selectively extract
sound information only at a particular time t0, are stored as in a control
table for the directions of the sound sources in an auxiliary storage means
(not shown) in the control means 15, and are identified and selected as
appropriate by the sound processing means 14 from the auxiliary storage
means based on the position information A3, B3 and C3 from the image
processing means 13.
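
A table of preset directional filters keyed by direction might be
organized as in the following sketch. The specification does not
disclose the internal form of a filter, so each filter is represented
here, purely as an assumption, by the inter-channel delay in samples that
corresponds to its direction; the names build_filter_table and
select_filter, the sampling rate and the speed of sound are likewise
hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def build_filter_table(spacing_m: float, sample_rate_hz: float,
                       step_deg: int = 5) -> dict:
    """Map each preset direction (degrees) to a delay in whole samples."""
    table = {}
    for theta in range(-90, 91, step_deg):
        delay_s = spacing_m * math.sin(math.radians(theta)) / SPEED_OF_SOUND
        table[theta] = round(delay_s * sample_rate_hz)
    return table


def select_filter(table: dict, direction_deg: float) -> tuple:
    """Return (preset direction, delay) of the entry nearest direction_deg."""
    nearest = min(table, key=lambda theta: abs(theta - direction_deg))
    return nearest, table[nearest]
```

For instance, with presets every 5 degrees, position information
indicating about 22 degrees would select the filter preset for 20
degrees.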

This permits pieces of sound information emitted concurrently from
sound sources and collected by the sound collecting microphones 11a and
11b to be acquired when, as shown in Fig. 6, a piece of sound information is
given on the right hand side at time t1 and then another piece of
information on the left hand side is taken out at time t2, that is, after a
delay time Δt following it (t2 = t1 + Δt). Note, however, that Δt can also
be negative.
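
As a worked illustration, and assuming a far-field (plane-wave) model that
the embodiment does not itself state, the delay time Δt can be related to
the direction θ of a sound source, the spacing d between the microphones
11a and 11b and the speed of sound c. The symbols d and c, the sign
convention (θ positive toward the right hand microphone) and the numerical
values are assumptions chosen for the example.

```latex
\Delta t = t_2 - t_1 = \frac{d \sin\theta}{c}
% Example (assumed values): d = 0.3 m, c = 343 m/s, \theta = +30^\circ
\Delta t = \frac{0.3 \times \sin 30^\circ}{343} \approx 4.4 \times 10^{-4}\ \text{s}
% For \theta = -30^\circ the magnitude is the same and the sign is negative.
```

Under this convention a source on the left hand side reaches the left hand
microphone first, which corresponds to the negative Δt noted above.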

In this way, the selection of a particular directional filter by the
sound processing means 14 with respect to each of the sound sources A, B
and C, each possessing directional information that is accurate to a certain
extent, enables their respective pieces of sound information to be obtained
from the mixture of sounds as shown in Fig. 7.

It should be noted at this point that narrowing the respective ranges 
of the directions of the sound sources by the pieces of position information 
A3, B3 and C3 makes it unnecessary for the sound processing means 14 to 
conduct processing over the entire range of angles for θ (-90 degrees
≤ θ ≤ +90 degrees) and makes it sufficient for the same to process a
certain narrowed range of angles about the pieces of position information 
A3, B3 and C3. 
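
The saving can be illustrated with a minimal sketch. The 1-degree
resolution, the half-width of 10 degrees about each position and the
function name narrowed_angles are assumptions made for this example; the
three centers are illustrative values only.

```python
def narrowed_angles(center_deg, half_width_deg=10, step_deg=1, lo=-90, hi=90):
    """Candidate angles (integer degrees) within +/- half_width_deg of
    center_deg, clipped to the overall range [lo, hi]."""
    start = max(lo, center_deg - half_width_deg)
    stop = min(hi, center_deg + half_width_deg)
    return list(range(start, stop + 1, step_deg))


# Full scan of -90 to +90 degrees at the assumed 1-degree resolution.
full_scan = list(range(-90, 91))

# Illustrative directions standing in for position information A3, B3, C3.
centers = [-30, 0, 20]
narrowed_total = sum(len(narrowed_angles(c)) for c in centers)

print(len(full_scan), narrowed_total)
```

With these assumed values the number of angles to be processed per source
falls from 181 to 21.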

The control means 15, which may, for example, be comprised of a
computer, is designed to control the operations of the sound collecting
means 11, the imaging means 12, the image processing means 13 and the
sound processing means 14. The control means 15, as mentioned above, has
the directional filters stored as preset in the auxiliary storage means (not
shown) therein.

Constructed as mentioned above, the sound source identifying 
apparatus 10 according to the present form of embodiment operates as 
described below, in accordance with the flow chart shown in Fig. 8.

Referring to Fig. 8, in step ST1 the control means 15 acts on the
sound collecting means 11 to cause each of the sound collecting
microphones 11a and 11b to collect sound from sound sources A, B and C,
while in the meantime the control means 15 also acts on the imaging
means 12 to cause it to image the sound sources consecutively in step ST2.
Next, in step ST3 the control means 15 acts on the sound processing
means 14 to cause it to select or determine rough directions A0, B0 and C0
in which the sound sources are located, respectively (see Fig. 3(A)), based
on pieces of sound information for a difference between the phases and a
difference between the intensities that the sound from each of the sound
sources has as it is collected by the two microphones, respectively, in the
sound collecting means 11. Then, all the harmonic structures in which any
phase difference exists are examined to roughly separate the sound sources
from the mixture sound. Note that a harmonic structure is used here as one
example of the signal taken as the standard in each of the arbitrarily
divided frequency bands.
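
The use of the phase difference and the intensity difference in step ST3
may be pictured with the following minimal sketch. It assumes a far-field
source, a known microphone spacing and a speed of sound of 343 m/s, none
of which is specified in the embodiment, and the function names are
hypothetical. The estimate is only meaningful for bands in which the phase
difference has not wrapped, i.e. its magnitude is below π.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def direction_from_phase(phase_diff_rad, freq_hz, mic_spacing_m, c=SPEED_OF_SOUND):
    """Estimate a rough direction (degrees) from one band's phase difference.

    Assumes a far-field source and |phase_diff_rad| < pi (no phase wrapping).
    A positive phase difference is taken to mean the right microphone leads.
    """
    delay_s = phase_diff_rad / (2.0 * math.pi * freq_hz)
    sine = c * delay_s / mic_spacing_m
    sine = max(-1.0, min(1.0, sine))  # clamp in case noise pushes |sine| above 1
    return math.degrees(math.asin(sine))


def louder_side(right_intensity, left_intensity):
    """Use the intensity difference as a coarse left/right cue."""
    if right_intensity > left_intensity:
        return "right"
    if left_intensity > right_intensity:
        return "left"
    return "center"
```

In such a sketch the phase cue supplies the angle while the intensity cue
serves as a consistency check on which side the source lies.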

Subsequently, in step ST4 the control means 15 acts on the image
processing means 13 to cause it to select or determine position information
A3, B3 and C3 (see Fig. 3(C)) as to objects as possible sound sources
according to a color and/or shape thereof in image pictures received from
the imaging means 12, and within the ranges of the rough directions
received from the sound processing means 14.

Thereafter, in step ST5 the control means 15 acts on the sound
processing means 14 to cause it to localize the locations of the sound
sources A, B and C according to the sound information received from the
sound collecting means 11 within a given range of angles for the position
information A3, B3 and C3 received from the image processing means 13.

Finally, in step ST6 the sound processing means 14 selects a
particular directional filter to selectively extract sound information of
the same sound from the same sound source with a particular time delay.
Because sound information of another, erroneous harmonic structure is not
processed, this reduces the error and increases the sound source separation
efficiency.
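
The embodiment does not disclose the internal form of the directional
filters. Purely as an illustration of extracting sound with a particular
time delay, a conventional delay-and-sum operation (a substitute technique,
not taken from the present description) is sketched below. The function
name and the sample-based delay are assumptions.

```python
def delay_and_sum(right, left, delay_samples):
    """Average the right channel with the left channel shifted by delay_samples.

    A positive delay_samples means the left channel lags the right channel,
    so sample i on the right is paired with sample i + delay_samples on the
    left. Samples falling outside the left channel are treated as zero.
    """
    output = []
    for i, right_sample in enumerate(right):
        j = i + delay_samples
        left_sample = left[j] if 0 <= j < len(left) else 0.0
        output.append(0.5 * (right_sample + left_sample))
    return output


# Toy signals: one impulse that reaches the left channel two samples late.
right_channel = [0.0, 1.0, 0.0, 0.0, 0.0]
left_channel = [0.0, 0.0, 0.0, 1.0, 0.0]

matched = delay_and_sum(right_channel, left_channel, 2)
unmatched = delay_and_sum(right_channel, left_channel, 0)
print(matched, unmatched)
```

When the delay matches the direction of the source the two channels add
coherently; with a mismatched delay the same sound is attenuated.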

In this manner, it will be seen that in the sound source identifying
apparatus 10 according to the illustrated form of embodiment of the present
invention, the sound processing means 14, in identifying a sound source, is
made to operate based not only on sound information received from the
sound collecting means 11 but also on an image picture imaged by the
imaging means 12, thus referring to position information A3, B3 and C3 of
an object that can be the sound source. The apparatus thereby has the
ability to identify a sound source with an accuracy increased from that of
around ±10 degrees attainable with the conventional system, which relies
only on sound information from the sound collecting means 11.

It is further seen that enhancing the accuracy of localizing the
location of a sound source, by refining the sound information that
beforehand roughly separates the sound source from another sound source
with position information derived from image information, makes
identification of the sound sources reliable even if they are close to
each other.

More specifically, if three talkers as sound sources are imaged
consecutively by the imaging means 12, image pictures are, for example,
obtainable as shown in Fig. 9 in which they are of the 5V\ 78*'' and 
15S^^ frames of all the pictures consecutively imaged. 

Here, these talkers' faces are actually located as shown in Fig. 10(A),
from which it is apparent that the talkers are positioned at around -30
degrees, 0 degrees and +20 degrees of directional angle, respectively.

Then, if determination is made to locate these objects possibly as
the sound sources by the image processing means 13 processing the images
only on the basis of the color, it is seen as shown in the graph of Fig. 10(B)
that various other objects in the image pictures are recognized, too, as
sound sources by mistake. If, however, the image processing is based on
both the color and height, the mistakes are seen to decrease as shown
in the graph of Fig. 10(C).

Further, if the image processing means 13 is caused to process the
images based on the color only while referring to the rough directions A0,
B0 and C0 received from the sound processing means 14, the mistakes are
seen to decrease still more as shown in the graph of Fig. 10(D).

Yet further, if the image processing means 13 is caused to process
the images based on both the color and height while referring to the rough
directions A0, B0 and C0 received from the sound processing means 14, it
is apparent that the position information of the sound sources can be
determined with an accuracy that compares favorably with the actual face
positions shown in Fig. 10(A), that is, with considerable certainty.
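
The successive narrowing described with reference to Figs. 10(B) to 10(D)
may be pictured with the following minimal sketch. The Candidate record,
the 10-degree tolerance, the function name and all of the data are invented
for illustration and are not measurements from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    direction_deg: float  # horizontal direction of the image region
    color_match: bool     # region has the expected color
    height_match: bool    # region has a plausible height


def select_candidates(candidates, use_height=False, rough_dirs=None,
                      tolerance_deg=10.0):
    """Keep regions passing the color test and any optional extra tests."""
    kept = []
    for cand in candidates:
        if not cand.color_match:
            continue
        if use_height and not cand.height_match:
            continue
        if rough_dirs is not None and not any(
            abs(cand.direction_deg - d) <= tolerance_deg for d in rough_dirs
        ):
            continue
        kept.append(cand)
    return kept


# Toy data, invented for illustration: three faces and three false detections.
regions = [
    Candidate(-30.0, True, True),
    Candidate(0.0, True, True),
    Candidate(20.0, True, True),
    Candidate(-70.0, True, True),
    Candidate(55.0, True, True),
    Candidate(5.0, True, False),
]
rough = [-25.0, 3.0, 15.0]  # hypothetical rough directions A0, B0 and C0

color_only = select_candidates(regions)
color_height = select_candidates(regions, use_height=True)
color_rough = select_candidates(regions, rough_dirs=rough)
all_cues = select_candidates(regions, use_height=True, rough_dirs=rough)
print(len(color_only), len(color_height), len(color_rough), len(all_cues))
```

In this toy data each additional cue removes further false detections,
leaving only the three intended regions when all cues are combined.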

While in the example mentioned above the horizontal coordinates
A3, B3 and C3 of the center positions A2, B2 and C2 of the frames A1, B1
and C1 in the pictures imaged consecutively of the objects that can be the
sound sources are used to provide information as to locations thereof, use
may instead be made of both horizontal and vertical coordinates to provide
information as to locations thereof.
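
For comparison with the directions obtained from sound, a horizontal
coordinate in a picture may be converted to a directional angle. The
sketch below assumes an ideal pinhole camera whose optical axis points at
0 degrees and whose horizontal field of view is known; these assumptions,
the function name and the example values are not taken from the embodiment.

```python
import math


def pixel_to_direction(x_px, image_width_px, horizontal_fov_deg):
    """Convert a horizontal pixel coordinate to a direction in degrees.

    Uses a pinhole model: the image center maps to 0 degrees and the image
    edges map to plus or minus half the horizontal field of view.
    """
    half_width = image_width_px / 2.0
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((x_px - half_width) / focal_px))


# Assumed example: a 640-pixel-wide picture with a 60-degree field of view.
print(pixel_to_direction(320, 640, 60.0))
```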

Further, in the example mentioned above the image processing
means 13 is designed to select or determine information as to the locations
of the objects that can be the sound sources on the basis of the color and
shape (e.g., height) of the objects in the pictures imaged consecutively.

Still further, while in the example mentioned above the image
processing means 13 is designed to effect image processing with reference
to the rough directions A0, B0 and C0 received from the sound processing
means 14, the invention broadly is not limited thereto but may have
information selected or determined as to the locations of the objects that
can be the sound sources only on the basis of pictorial information received
from the imaging means 12.

In order to detect the direction in which a sound source is located,
an active element, such as in the form of a badge carrying magnetism, may
be attached to the sound source to determine the direction in which the
magnetism is emitted by using a magnetic sensing device as its detecting
means. The direction detected by the magnetic sensing device may be fed
back to the sound processing means and used by the latter to prepare a
directional filter, thereby separating the sound source.

In case the sound source is a person, its emission of heat rays
renders an infrared detector usable to detect the direction in which the
sound source is located.

As described in the foregoing, it is seen that the present invention,
according to which, in identifying a sound source based on sound
information, the direction in which the sound source is located is narrowed
based on information as to its image and information as to its detected
direction, with reference to information as to the location of
an object that can be the sound source, makes it unnecessary to process the
sound information omnidirectionally, that is, over all the directions, in
identifying the sound source, makes it possible to identify the sound source
with greater certainty, makes a lesser amount of information to be processed
sufficient and makes it possible to reduce the time for processing.
Accordingly, a highly advantageous sound source identifying apparatus
and method are provided in accordance with the present invention, which
make it possible to identify a plurality of sound sources with due accuracy
by means of a pair of microphones.

Although the present invention has hereinbefore been set forth with
respect to certain illustrative forms of embodiment thereof, it will readily
be appreciated to be obvious to a person skilled in the art that many
alterations thereof, omissions therefrom and additions thereto can be
made without departing from the essence and scope of the present invention.
Accordingly, it should be understood that the invention is not intended to
be limited to the specific forms of embodiment thereof set forth above, but
to include all possible forms of embodiment thereof that can be made
within the scope with respect to the features specifically set forth in the
appended claims, and to encompass all the equivalents thereof.

Industrial Applicability

As will be appreciated from the foregoing description, a sound 
source identifying apparatus and method according to the present invention 
are highly useful as a sound source identifying apparatus and method 
whereby the location of an object as a sound source is identified with due 
certainty based on both sound and image information and the use of its
position information permits each of such sound sources to be separated
from mixed sounds with due accuracy.

What is claimed is:



1. A sound source identifying apparatus, comprising: 

a sound collecting means for capturing sounds from a plurality of
sound sources with a pair of sound collecting microphones juxtaposed with
each other across a preselected spacing and opposed to the sound sources
and for processing the captured sounds;

either or both of an imaging means and a sensing means, the
imaging means being adapted to consecutively image objects that can be
said sound sources, said sensing means being for sensing said objects
possibly being said sound sources;

an image processing means for deriving information as to locations
of said objects possibly being said sound sources, from either or both of
image pictures imaged by said imaging means and directional information
of said objects sensed by said sensing means;

a sound processing means for localizing the locations of said sound
sources based on sound information of said sounds captured by said sound
collecting means and position information derived by said image
processing means; and

a control means for controlling operations of said sound collecting
means, said imaging means and/or said sensing means, said image
processing means, and said sound processing means.

2. A sound source identifying apparatus as set forth in claim 1,
characterized in that said sound processing means includes directional
filters, each of which is adapted to extract sound information at a particular
time instant selectively.

3. A sound source identifying apparatus as set forth in claim 1 or
claim 2, characterized in that said sound processing means has a function
to derive information as to rough directions in which said objects possibly
being the sound sources are located.

4. A sound source identifying apparatus as set forth in any one of
claim 1 to claim 3, characterized in that said sensing means is adapted to
sense said objects possibly being said sound sources in response to
magnetism thereof.

5. A sound source identifying apparatus as set forth in any one of
claim 1 to claim 3, characterized in that said sensing means is adapted to
sense said objects possibly being said sound sources in response to infrared
rays that they emit.

6. A sound source identifying apparatus as set forth in any one of 
claim 1 to claim 3, characterized in that said objects possibly being said 
sound sources each have a material carrying magnetism attached thereto.

7. A sound source identifying method, characterized in that it 
comprises: 

a first step of capturing sounds from a plurality of sound sources
with a pair of sound collecting microphones juxtaposed with each other
across a preselected spacing and opposed to the sound sources and for
processing the captured sounds, in a sound collecting means;

a second step, conducted concurrently with the first step, of
consecutively imaging objects that can be said sound sources and/or
sensing directions in which said objects are located;

a third step of deriving information as to locations of said objects
possibly being said sound sources, from either or both of image pictures
imaged, and the directions sensed, in the second step; and

a fourth step of localizing locations of said sound sources based on
sound information of the sounds collected in the first step and position
information derived in the third step.

8. A sound source identifying method as set forth in claim 7,
characterized in that it further includes a fifth step of deriving information
as to rough locations of said sound sources only from the sound
information of said sounds collected in said first step; and that said third
step includes narrowing in advance directions of said sound sources based
on the rough position information derived in said fifth step, thereby
deriving said position information of said objects possibly being said
sound sources.

9. A sound source identifying method as set forth in claim 8,
characterized in that said fifth step roughly derives the directions of said
sound sources from sound information for a difference between phases and
a difference between intensities which each of said sounds has when
acquired by said sound collecting microphones, respectively.

10. A sound source identifying method as set forth in any one of
claim 7 to claim 9, characterized in that said position information of said
objects possibly being said sound sources is derived in said third step on
the basis of either or both of a color and a shape of each said object.

11. A sound source identifying method as set forth in claim 1,
characterized in that said fourth step localizes the locations of said sound
sources by selecting particular preset directional filters in response to the
position information derived in said third step to extract sound information
from each of said sound sources.

12. A sound source identifying method as set forth in any one of
claim 7 to claim 11, characterized in that the positions of said sound
sources are determined in said fourth or fifth step on the basis of a signal in
each of arbitrarily divided frequency bands, based on the sound
information obtained in said first step.

13. A sound source identifying method as set forth in any one of
claim 7 to claim 9, claim 11 and claim 12, characterized in that said
position information of a said object possibly being a said sound source is
derived from a movement of said object.



14. A sound source identifying method as set forth in any one of
claim 7 to claim 13, characterized in that a said direction is sensed in
response to magnetism.

15. A sound source identifying method as set forth in any one of
claim 7 to claim 13, characterized in that said direction is sensed in
response to an infrared ray.